Results 1 - 20 of 24
1.
Drug Resist Updat ; 74: 101083, 2024 May.
Article in English | MEDLINE | ID: mdl-38593500

ABSTRACT

AIMS: Carbapenem-resistant Klebsiella pneumoniae (CRKP) is a global threat that varies by region. The global distribution, evolution, and clinical implications of the ST11 CRKP clone remain obscure. METHODS: We conducted a multicenter molecular epidemiological survey using isolates obtained from 28 provinces and municipalities across China between 2011 and 2021. We integrated sequences from public databases and performed genetic epidemiology analysis of ST11 CRKP. RESULTS: Among ST11 CRKP, KL64 serotypes exhibited considerable expansion, increasing from 1.54% to 46.08% between 2011 and 2021. Combining our data with public databases, the phylogenetic and phylogeographic analyses indicated that ST11 CRKP appeared in the Americas in 1996 and spread worldwide, with key clones progressing from China's southeastern coast to the inland by 2010. Global phylogenetic analysis showed that ST11 KL64 CRKP has evolved to a virulent, resistant clade with notable regional spread. Single-nucleotide polymorphism (SNP) analysis identified BMPPS (bmr3, mltC, pyrB, ppsC, and sdaC) as a key marker for this clade. The BMPPS SNP clade is associated with high mortality and has strong anti-phagocytic and competitive traits in vitro. CONCLUSIONS: The high-risk ST11 KL64 CRKP subclone showed strong expansion potential and survival advantages, probably owing to genetic factors.


Subject(s)
Anti-Bacterial Agents , Klebsiella Infections , Klebsiella pneumoniae , Phylogeny , Humans , China/epidemiology , Klebsiella pneumoniae/genetics , Klebsiella pneumoniae/drug effects , Klebsiella pneumoniae/isolation & purification , Klebsiella Infections/epidemiology , Klebsiella Infections/microbiology , Klebsiella Infections/transmission , Klebsiella Infections/drug therapy , Anti-Bacterial Agents/pharmacology , Polymorphism, Single Nucleotide , Carbapenem-Resistant Enterobacteriaceae/genetics , Carbapenem-Resistant Enterobacteriaceae/drug effects , Carbapenem-Resistant Enterobacteriaceae/isolation & purification , Molecular Epidemiology , Carbapenems/pharmacology , Microbial Sensitivity Tests , Phylogeography , Serogroup , Genomics/methods
2.
Nat Microbiol ; 9(3): 595-613, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38347104

ABSTRACT

Microbial breakdown of organic matter is one of the most important processes on Earth, yet the controls of decomposition are poorly understood. Here we track 36 terrestrial human cadavers in three locations and show that a phylogenetically distinct, interdomain microbial network assembles during decomposition despite selection effects of location, climate and season. We generated a metagenome-assembled genome library from cadaver-associated soils and integrated it with metabolomics data to identify links between taxonomy and function. This universal network of microbial decomposers is characterized by cross-feeding to metabolize labile decomposition products. The key bacterial and fungal decomposers are rare across non-decomposition environments and appear unique to the breakdown of terrestrial decaying flesh, including humans, swine, mice and cattle, with insects as likely important vectors for dispersal. The observed lockstep of microbial interactions further underlies a robust microbial forensic tool with the potential to aid predictions of the time since death.


Subject(s)
Microbial Consortia , Soil Microbiology , Mice , Humans , Animals , Swine , Cattle , Cadaver , Metagenome , Bacteria
3.
J Biomed Inform ; 152: 104615, 2024 04.
Article in English | MEDLINE | ID: mdl-38423266

ABSTRACT

OBJECTIVE: Sepsis is one of the most serious hospital conditions associated with high mortality. Sepsis is the result of a dysregulated immune response to infection that can lead to multiple organ dysfunction and death. Due to the wide variability in the causes of sepsis, clinical presentation, and the recovery trajectories, identifying sepsis sub-phenotypes is crucial to advance our understanding of sepsis characterization, to choose targeted treatments and optimal timing of interventions, and to improve prognostication. Prior studies have described different sub-phenotypes of sepsis using organ-specific characteristics. These studies applied clustering algorithms to electronic health records (EHRs) to identify disease sub-phenotypes. However, prior approaches did not capture temporal information and made uncertain assumptions about the relationships among the sub-phenotypes for clustering procedures. METHODS: We developed a time-aware soft clustering algorithm guided by clinical variables to identify sepsis sub-phenotypes using data available in the EHR. RESULTS: We identified six novel sepsis hybrid sub-phenotypes and evaluated them for medical plausibility. In addition, we built an early-warning sepsis prediction model using logistic regression. CONCLUSION: Our results suggest that these novel sepsis hybrid sub-phenotypes are promising to provide more accurate information on sepsis-related organ dysfunction and sepsis recovery trajectories which can be important to inform management decisions and sepsis prognosis.


Subject(s)
Electronic Health Records , Sepsis , Humans , Algorithms , Phenotype , Cluster Analysis , Sepsis/diagnosis
4.
Nat Commun ; 15(1): 67, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38167298

ABSTRACT

The acquisition of exogenous mobile genetic material imposes an adaptive burden on bacteria, yet the adaptive evolution of virulence plasmids after they enter carbapenem-resistant Klebsiella pneumoniae (CRKP), and the impact of that evolution, remain unclear. To better understand virulence in CRKP, we characterize virulence plasmids using a large genomic dataset of 1219 K. pneumoniae from our long-term surveillance and publicly accessible databases. Phylogenetic evaluation reveals associations between distinct virulence plasmids and serotypes. The sub-lineage ST11-KL64 CRKP acquires a pK2044-like virulence plasmid from ST23-KL1 hypervirulent K. pneumoniae, with a 2698 bp region deleted in all ST11-KL64. The deletion is observed to regulate methionine metabolism, enhance antioxidant capacity, and further improve survival of hypervirulent CRKP in macrophages. The pK2044-like virulence plasmid discards certain sequences to enhance survival of ST11-KL64, thereby conferring an evolutionary advantage. This work contributes to a multifaceted understanding of virulence and provides insight into potential causes behind the low fitness costs observed in bacteria.


Subject(s)
Antioxidants , Carbapenem-Resistant Enterobacteriaceae , Klebsiella pneumoniae/genetics , Phylogeny , Acclimatization , Carbapenem-Resistant Enterobacteriaceae/genetics , Carbapenems/pharmacology , Anti-Bacterial Agents/pharmacology
5.
Neural Comput ; 36(1): 107-127, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38052079

ABSTRACT

This letter considers the use of machine learning algorithms for predicting cocaine use based on magnetic resonance imaging (MRI) connectomic data. The study used functional MRI (fMRI) and diffusion MRI (dMRI) data collected from 275 individuals, which were then parcellated into 246 regions of interest (ROIs) using the Brainnetome atlas. After data preprocessing, the data sets were transformed into tensor form. We developed a tensor-based unsupervised machine learning algorithm to reduce the size of the data tensor from 275 (individuals) × 2 (fMRI and dMRI) × 246 (ROIs) × 246 (ROIs) to 275 (individuals) × 2 (fMRI and dMRI) × 6 (clusters) × 6 (clusters). This was achieved by applying the high-order Lloyd algorithm to group the ROI data into six clusters. Features were extracted from the reduced tensor and combined with demographic features (age, gender, race, and HIV status). The resulting data set was used to train a CatBoost model using subsampling and nested cross-validation techniques, which achieved a prediction accuracy of 0.857 for identifying cocaine users. The model was also compared with other models, and the feature importance of the model was presented. Overall, this study highlights the potential for using tensor-based machine learning algorithms to predict cocaine use based on MRI connectomic data and presents a promising approach for identifying individuals at risk of substance abuse.


Subject(s)
Cocaine , Connectome , Humans , Connectome/methods , Magnetic Resonance Imaging/methods , Multimodal Imaging , Machine Learning
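
The size reduction described in entry 5 (246 × 246 ROIs down to 6 × 6 clusters per individual and modality) can be illustrated with a minimal sketch. It assumes the cluster labels are already known and that each reduced entry is the mean of the corresponding ROI block; the abstract does not state how the reduced entries are formed, and the high-order Lloyd algorithm that finds the labels is not implemented here. The function name `reduce_connectome`, the synthetic data, and the round-robin labels are all hypothetical, and only 5 individuals are simulated (instead of 275) to keep memory use small.

```python
import numpy as np

def reduce_connectome(tensor, labels, n_clusters):
    """Average ROI-by-ROI blocks into cluster-by-cluster blocks.

    tensor: array of shape (individuals, modalities, rois, rois)
    labels: integer array of length rois with values in [0, n_clusters)
    """
    n_rois = tensor.shape[-1]
    # One-hot membership matrix (rois x clusters); dividing each column by
    # its cluster size turns block sums into block means.
    membership = np.zeros((n_rois, n_clusters))
    membership[np.arange(n_rois), labels] = 1.0
    counts = membership.sum(axis=0)
    if np.any(counts == 0):
        raise ValueError("every cluster needs at least one ROI")
    membership /= counts
    # Contract both ROI modes with the membership matrix.
    return np.einsum("imab,ak,bl->imkl", tensor, membership, membership,
                     optimize=True)

rng = np.random.default_rng(0)
data = rng.normal(size=(5, 2, 246, 246))  # synthetic stand-in for fMRI/dMRI
labels = np.arange(246) % 6               # hypothetical cluster assignment
reduced = reduce_connectome(data, labels, 6)
print(reduced.shape)
```

Features for a downstream classifier could then be read off the 6 × 6 blocks of `reduced`, for example by flattening them per individual.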
6.
medRxiv ; 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38076824

ABSTRACT

Alzheimer's disease (AD) is influenced by a variety of modifiable risk factors, including a person's dietary habits. While the ketogenic diet (KD) holds promise in reducing metabolic risks and potentially affecting AD progression, only a few studies have explored KD's metabolic impact, especially on blood and cerebrospinal fluid (CSF). Our study involved participants at risk for AD, either cognitively normal or with mild cognitive impairment. The participants consumed both a modified Mediterranean-ketogenic diet (MMKD) and the American Heart Association diet (AHAD) for 6 weeks each, separated by a 6-week washout period. We employed nuclear magnetic resonance (NMR)-based metabolomics to profile serum and CSF and metagenomics profiling on fecal samples. While the AHAD induced no notable metabolic changes, MMKD led to significant alterations in both serum and CSF. These changes included improved modifiable risk factors, like increased HDL-C and reduced BMI, reversed serum metabolic disturbances linked to AD such as a microbiome-mediated increase in valine levels, and a reduction in systemic inflammation. Additionally, the MMKD was linked to increased amino acid levels in the CSF, a breakdown of branched-chain amino acids (BCAAs), and decreased valine levels. Importantly, we observed a strong correlation between metabolic changes in the CSF and serum, suggesting a systemic regulation of metabolism. Our findings highlight that MMKD can improve AD-related risk factors, reverse some metabolic disturbances associated with AD, and align metabolic changes across the blood-CSF barrier.

7.
Drug Alcohol Depend ; 251: 110923, 2023 Oct 01.
Article in English | MEDLINE | ID: mdl-37598454

ABSTRACT

BACKGROUND: Illicit stimulant use remains a public health concern that has been associated with multiple adverse outcomes, including cognitive deficits. The effects of stimulant use on cognition may be particularly deleterious in persons with HIV. Stimulant use intensity may be an important factor in the magnitude of observed deficits over time. METHODS: We completed neurocognitive testing in a sample of people who use stimulants with (n = 84) and without HIV (n = 123) at baseline and up to 4 follow-up time points over approximately 1 year. Participants reported on substance use at each visit, including frequency of use and stimulant dependence. Mixed effects models examined the relationship between stimulant-related factors and neurocognitive function over time. RESULTS: Participants were mostly male (57%), African American (86%), and 47.41 years old on average. All participants actively used stimulants at enrollment and use remained prevalent throughout the follow-up period, with an average of ≥24 days of use in the past 90 days at all time points. Retention was excellent, with 86% completing all 4 follow-up assessments. Mixed effects models showed that stimulant dependence was associated with lower neurocognitive performance independent of HIV status (p = 0.002), whereas frequency of use had a greater negative impact on performance in participants with HIV compared to those without HIV (p = 0.045). CONCLUSIONS: Our key finding is that stimulant-related factors are associated with neurocognitive performance over time, but in complex ways. These findings have important implications for harm reduction approaches, particularly those that target cognitive function.

8.
Nat Aging ; 3(7): 776-790, 2023 07.
Article in English | MEDLINE | ID: mdl-37400722

ABSTRACT

Cellular senescence is a well-established driver of aging and age-related diseases. There are many challenges to mapping senescent cells in tissues such as the absence of specific markers and their relatively low abundance and vast heterogeneity. Single-cell technologies have allowed unprecedented characterization of senescence; however, many methodologies fail to provide spatial insights. The spatial component is essential, as senescent cells communicate with neighboring cells, impacting their function and the composition of extracellular space. The Cellular Senescence Network (SenNet), a National Institutes of Health (NIH) Common Fund initiative, aims to map senescent cells across the lifespan of humans and mice. Here, we provide a comprehensive review of the existing and emerging methodologies for spatial imaging and their application toward mapping senescent cells. Moreover, we discuss the limitations and challenges inherent to each technology. We argue that the development of spatially resolved methods is essential toward the goal of attaining an atlas of senescent cells.


Subject(s)
Aging , Cellular Senescence , United States , Humans , Animals , Mice , Longevity
9.
AMIA Jt Summits Transl Sci Proc ; 2023: 291-299, 2023.
Article in English | MEDLINE | ID: mdl-37350882

ABSTRACT

Electronic Health Record (EHR) data are captured over time as patients receive care. Accordingly, variations among patients, such as when a patient presents for care during the course of a disease, introduce bias into standard longitudinal EHR data analysis methods. We, therefore, aim to provide an alignment method that reduces this bias. We structure this task as a registration problem. While limited prior research on longitudinal EHR data considered registration, we propose a robust registration method to provide better data alignment by estimating the optimum time shift at each time point. We validate the proposed method for mortality prediction. We utilize a Recurrent Neural Network (RNN), time-varying Cox regression model, and Logistic Regression (LR) for mortality prediction. Results suggest our proposed registration method enhances mortality prediction with at least a 1-2% increase in major evaluation metrics utilized.

10.
Eur Radiol ; 33(8): 5779-5791, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36894753

ABSTRACT

OBJECTIVE: To develop and evaluate task-based radiomic features extracted from the mesenteric-portal axis for prediction of survival and response to neoadjuvant therapy in patients with pancreatic ductal adenocarcinoma (PDAC). METHODS: Consecutive patients with PDAC who underwent surgery after neoadjuvant therapy from two academic hospitals between December 2012 and June 2018 were retrospectively included. Two radiologists performed a volumetric segmentation of PDAC and mesenteric-portal axis (MPA) using a segmentation software on CT scans before (CTtp0) and after (CTtp1) neoadjuvant therapy. Segmentation masks were resampled into uniform 0.625-mm voxels to develop task-based morphologic features (n = 57). These features aimed to assess MPA shape, MPA narrowing, changes in shape and diameter between CTtp0 and CTtp1, and length of MPA segment affected by the tumor. A Kaplan-Meier curve was generated to estimate the survival function. To identify reliable radiomic features associated with survival, a Cox proportional hazards model was used. Features with an ICC ≥ 0.80 were used as candidate variables, with clinical features included a priori. RESULTS: In total, 107 patients (60 men) were included. The median survival time was 895 days (95% CI: 717, 1061). Three task-based shape radiomic features (Eccentricity mean tp0, Area minimum value tp1, and Ratio 2 minor tp1) were selected. The model showed an integrated AUC of 0.72 for prediction of survival. The hazard ratio for the Area minimum value tp1 feature was 1.78 (p = 0.02) and 0.48 for the Ratio 2 minor tp1 feature (p = 0.002). CONCLUSION: Preliminary results suggest that task-based shape radiomic features can predict survival in PDAC patients.
KEY POINTS:
• In a retrospective study of 107 patients who underwent neoadjuvant therapy followed by surgery for PDAC, task-based shape radiomic features were extracted and analyzed from the mesenteric-portal axis.
• A Cox proportional hazards model that included three selected radiomic features plus clinical information showed an integrated AUC of 0.72 for prediction of survival, and a better fit compared to the model with only clinical information.


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Male , Humans , Retrospective Studies , Pancreatic Neoplasms/diagnostic imaging , Pancreatic Neoplasms/therapy , Carcinoma, Pancreatic Ductal/diagnostic imaging , Carcinoma, Pancreatic Ductal/therapy , Tomography, X-Ray Computed/methods
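
Entry 10 mentions a Kaplan-Meier curve for estimating the survival function. As a minimal sketch of that estimator (not the authors' code or data), the product-limit formula can be written in a few lines; the function name `kaplan_meier` and the five toy observations are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns the distinct event times and S(t) just after each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)            # still under observation at t
        deaths = np.sum((times == t) & events)  # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Toy data (days); the second patient at day 8 and the last one are censored.
t, s = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
print(t, s)  # event times 5, 8, 12 with S(t) = 0.8, 0.6, 0.3
```

Censored patients stay in the risk set until their last follow-up time, which is why the denominator shrinks from 5 to 4 to 2 in the toy data.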
11.
Microbiol Spectr ; 10(6): e0240022, 2022 12 21.
Article in English | MEDLINE | ID: mdl-36222687

ABSTRACT

Carbapenem-resistant hypervirulent Klebsiella pneumoniae (CR-hvKP) is recognized as a threat worldwide, but the mechanisms underlying its emergence remain unclear. As most CR-hvKP isolates are not hypermucoviscous, we speculated that the evolution of the capsule might result in the convergence of carbapenem resistance and hypervirulence. Here, 2,096 K. pneumoniae isolates were retrospectively collected to screen the ST23-K1 clone, and hypervirulence was roughly defined as high resistance to serum killing. The effect of wcaJ on the capsule, virulence, fitness, and resistance acquisition was further analyzed. The capsule gene wcaJ, interrupted by insertion of ISKpn26/ISKpn74, was identified via whole-genome sequencing in four hvKP isolates that were not hypermucoviscous. Uronic acid quantitation results revealed that these isolates produced significantly less capsular polysaccharides than NTUH-K2044. A significant increase in capsular production was observed in wcaJ-complemented isolates and confirmed by transmission electron microscopy. Further, all wcaJ-complemented isolates acquired greater resistance to macrophage phagocytosis, and one representative isolate resulted in a significantly higher mortality rate than the parental isolate in mice, indicating that wcaJ inactivation might compromise virulence. However, isolates with wcaJ interruption demonstrated a lower fitness cost and a high conjugation frequency of the blaKPC-2 plasmid, raising concerns about the emergence of carbapenem resistance in hvKP. IMPORTANCE: Klebsiella pneumoniae is one of the most common nosocomial pathogens worldwide, and we speculated that the evolution of the capsule might result in the convergence of carbapenem resistance and hypervirulence of K. pneumoniae. The wcaJ gene was first reported to be interrupted by insertion sequence elements in ST23-K1 hypervirulent Klebsiella pneumoniae, resulting in little capsule synthesis, which plays an important role in virulence. We examined the effect of wcaJ on the capsule, virulence, and fitness. Interruption of wcaJ might compromise virulence, yet isolates carrying it demonstrated a lower fitness cost and a high conjugation frequency of the blaKPC-2 plasmid, highlighting wcaJ interruption as a potential factor facilitating the convergence of hypervirulence and carbapenem resistance.


Subject(s)
DNA Transposable Elements , Klebsiella Infections , Animals , Mice , Virulence/genetics , Klebsiella pneumoniae , Retrospective Studies , Plasmids/genetics , Carbapenems/pharmacology , Anti-Bacterial Agents/pharmacology
12.
IEEE Trans Inf Theory ; 68(6): 3991-4019, 2022 Jun.
Article in English | MEDLINE | ID: mdl-36274655

ABSTRACT

This paper studies a general framework for high-order tensor SVD. We propose a new computationally efficient algorithm, tensor-train orthogonal iteration (TTOI), that aims to estimate the low tensor-train rank structure from the noisy high-order tensor observation. The proposed TTOI consists of initialization via TT-SVD [1] and new iterative backward/forward updates. We develop the general upper bound on estimation error for TTOI with the support of several new representation lemmas on tensor matricizations. By developing a matching information-theoretic lower bound, we also prove that TTOI achieves the minimax optimality under the spiked tensor model. The merits of the proposed TTOI are illustrated through applications to estimation and dimension reduction of high-order Markov processes, numerical studies, and a real data example on New York City taxi travel records. The software of the proposed algorithm is available online (https://github.com/Lili-Zheng-stat/TTOI).
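
The abstract above says TTOI is initialized via TT-SVD. The following is a minimal NumPy sketch of the standard TT-SVD procedure (sequential truncated SVDs of reshaped unfoldings) only; it does not implement the backward/forward updates of TTOI and is not taken from the linked repository. The function names `tt_svd` and `tt_to_full`, the tensor sizes, and the ranks are illustrative assumptions.

```python
import numpy as np

def tt_svd(tensor, ranks):
    """Tensor-train decomposition by sequential truncated SVDs.

    ranks: target TT ranks, one per internal bond (length tensor.ndim - 1).
    Returns a list of cores with shapes (r_{k-1}, d_k, r_k), r_0 = r_d = 1.
    """
    dims = tensor.shape
    cores = []
    r_prev = 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        # Rows index (previous rank, current mode); columns index the rest.
        mat = mat.reshape(r_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(ranks[k], len(s))
        cores.append(u[:, :r].reshape(r_prev, dims[k], r))
        mat = s[:r, None] * vt[:r]  # carry the remainder to the next mode
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into a full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])

# Build a tensor with exact TT ranks (2, 3, 2) and recover it.
rng = np.random.default_rng(0)
dims, ranks = (4, 5, 6, 3), (2, 3, 2)
bond = (1,) + ranks + (1,)
true_cores = [rng.normal(size=(bond[k], dims[k], bond[k + 1]))
              for k in range(len(dims))]
x = tt_to_full(true_cores)
x_hat = tt_to_full(tt_svd(x, ranks))
print(np.max(np.abs(x - x_hat)))
```

In the noiseless example the reconstruction error is at machine precision; with a noisy observation, TT-SVD gives only a starting point that an iterative scheme would then refine.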

13.
IEEE Trans Inf Theory ; 68(9): 5975-6002, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36865503

ABSTRACT

We study sparse group Lasso for high-dimensional double sparse linear regression, where the parameter of interest is simultaneously element-wise and group-wise sparse. This problem is an important instance of the simultaneously structured model - an actively studied topic in statistics and machine learning. In the noiseless case, matching upper and lower bounds on sample complexity are established for the exact recovery of sparse vectors and for stable estimation of approximately sparse vectors, respectively. In the noisy case, upper and matching minimax lower bounds for estimation error are obtained. We also consider the debiased sparse group Lasso and investigate its asymptotic property for the purpose of statistical inference. Finally, numerical studies are provided to support the theoretical results.
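
To make the "simultaneously element-wise and group-wise sparse" structure above concrete, here is a minimal sketch of the proximal operator of a sparse group Lasso penalty of the common form lam_elem * ||b||_1 + lam_group * sum_g ||b_g||_2, which is element-wise soft-thresholding followed by group-wise shrinkage. The exact penalty weighting used in the paper is not given in the abstract, so this form, the function name, and the toy numbers are assumptions.

```python
import numpy as np

def prox_sparse_group_lasso(v, groups, lam_elem, lam_group):
    """Proximal operator of lam_elem*||b||_1 + lam_group*sum_g ||b_g||_2.

    groups: list of index lists forming a partition of the coordinates.
    """
    v = np.asarray(v, dtype=float)
    # Step 1: element-wise soft-thresholding (element-wise sparsity).
    out = np.sign(v) * np.maximum(np.abs(v) - lam_elem, 0.0)
    # Step 2: shrink each group toward zero as a block (group-wise sparsity).
    for idx in groups:
        norm = np.linalg.norm(out[idx])
        scale = max(0.0, 1.0 - lam_group / norm) if norm > 0 else 0.0
        out[idx] = out[idx] * scale
    return out

v = np.array([3.0, -0.5, 0.2, 0.1, 4.0, 0.0])
groups = [[0, 1, 2], [3, 4, 5]]
result = prox_sparse_group_lasso(v, groups, lam_elem=1.0, lam_group=1.0)
print(result)  # [1. 0. 0. 0. 2. 0.]
```

A proximal gradient solver for the regression problem would alternate a gradient step on the squared loss with this operator.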

14.
Antibiotics (Basel) ; 10(12)2021 Nov 26.
Article in English | MEDLINE | ID: mdl-34943669

ABSTRACT

Rapid and accurate detection can help optimize patient treatment and improve infection control against nosocomial carbapenemase-producing organisms (CPO). In this study, a total of 217 routine clinical isolates (Enterobacterales and A. baumannii), including 178 CPOs and 39 non-CPOs, were tested to evaluate the performance of six phenotypic carbapenemase detection and classification assays, i.e., BD Phoenix CPO detect panel, Rapidec Carba-NP, O.K.N detection kit, and three carbapenem inactivation methods (CIMs; mCIM, eCIM, sCIM). The overall detection sensitivity and specificity were 98.78% (95.21-99.79%) and 79.49% (63.06-90.13%), respectively, for the BD Phoenix CPO P/N test; 91.93% (86.30-95.45%) and 100% (88.83-100%), respectively, for the Rapidec Carba-NP; 98.06% (94.00-99.50%) and 97.44% (84.92-99.87%), respectively, for mCIM; and 96.89% (92.52-98.85%) and 94.87% (81.37-99.11%), respectively, for sCIM. The classification sensitivity and specificity for the BD Phoenix CPO Ambler test, the O.K.N detection kit, and the mCIM and eCIM were 56.71% (48.75-64.34%) and 94.87% (81.37-99.11%), 99.28% (95.43-99.96%) and 100% (88.83-100%), and 92.90% (87.35-96.23%) and 97.44% (84.92-99.87%), respectively. All detection assays were reliable in detecting carbapenemase. However, the Rapidec Carba-NP and mCIM were insufficient in detecting OXA-48-like enzymes. The BD Phoenix CPO detect panel had a strong ability to detect carbapenemase but failed to classify 48/59 (81.36%) KPC, 8/52 (15.38%) NDM, 8/22 (36.36%) OXA-23-like, and 6/11 (54.55%) dual enzymes. The O.K.N detection kit accurately detected and differentiated KPC, NDM, and OXA-48-like enzymes existing alone or in combination. The results of this study can help laboratories choose reliable assays and can inform therapeutic and infection control decisions.
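
The misclassification percentages quoted above follow directly from the reported counts, and sensitivity and specificity follow from a 2 × 2 confusion table. The short sketch below recomputes the percentages from the counts in the abstract and shows the two formulas. The helper names and the confusion-table counts in the last line are hypothetical and are not taken from the study.

```python
def percent(numerator, denominator):
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100.0 * numerator / denominator, 2)

# (not classified, total) per enzyme group, as reported for the BD Phoenix panel
unclassified = {
    "KPC": (48, 59),
    "NDM": (8, 52),
    "OXA-23-like": (8, 22),
    "dual enzymes": (6, 11),
}
rates = {name: percent(k, n) for name, (k, n) in unclassified.items()}
print(rates)  # KPC 81.36, NDM 15.38, OXA-23-like 36.36, dual enzymes 54.55

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=45, fp=5)
```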

15.
J Neurosurg Pediatr ; 28(5): 533-543, 2021 Aug 13.
Article in English | MEDLINE | ID: mdl-34388710

ABSTRACT

OBJECTIVE: Postoperative hydrocephalus occurs in one-third of children after posterior fossa tumor resection. Although models to predict the need for CSF diversion after resection exist for preoperative variables, it is unknown which postoperative variables predict the need for CSF diversion. In this study, the authors sought to determine the clinical and radiographic predictors for CSF diversion in children following posterior fossa tumor resection. METHODS: This was a retrospective cohort study involving patients ≤ 18 years of age who underwent resection of a primary posterior fossa tumor between 2000 and 2018. The primary outcome was the need for CSF diversion 6 months after surgery. Candidate predictors for CSF diversion including age, race, sex, frontal occipital horn ratio (FOHR), tumor type, tumor volume and location, transependymal edema, papilledema, presence of postoperative intraventricular blood, and residual tumor were evaluated using a best subset selection method with logistic regression. RESULTS: Of the 63 included patients, 26 (41.3%) had CSF diversion at 6 months. Patients who required CSF diversion had a higher median FOHR (0.5 vs 0.4) and a higher percentage of postoperative intraventricular blood (30.8% vs 2.7%) compared with those who did not. A 0.1-unit increase in FOHR or intraventricular blood was associated with increased odds of CSF diversion (OR 2.9 [95% CI 1.3-7.8], p = 0.02 and OR 20.2 [95% CI 2.9-423.1], p = 0.01, respectively) with an overfitting-corrected concordance index of 0.68 (95% CI 0.56-0.80). CONCLUSIONS: The preoperative FOHR and postoperative intraventricular blood were significant predictors of the need for permanent CSF diversion within 6 months after posterior fossa tumor resection in children.


Subject(s)
Hydrocephalus/cerebrospinal fluid , Hydrocephalus/diagnosis , Infratentorial Neoplasms/surgery , Child , Child, Preschool , Female , Humans , Hydrocephalus/complications , Infratentorial Neoplasms/complications , Lateral Ventricles/blood supply , Male , Postoperative Complications/surgery , Retrospective Studies , Third Ventricle/blood supply , Treatment Outcome
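
Entry 15 reports an odds ratio per 0.1-unit increase in FOHR from a logistic regression. In such a model the odds ratio for a change of size `step` in a predictor is exp(beta × step), so the reported value can be converted to other step sizes. The sketch below shows that conversion; it assumes the effect is log-linear in FOHR, as a logistic model implies, and the function name is hypothetical.

```python
import math

def odds_ratio(beta_per_unit, step):
    """Odds ratio for a change of `step` in a predictor with log-odds slope beta."""
    return math.exp(beta_per_unit * step)

reported_or = 2.9   # per 0.1-unit increase in FOHR, from the abstract
step = 0.1
beta = math.log(reported_or) / step  # implied log-odds slope per 1.0 unit

# Under log-linearity, a 0.2-unit increase corresponds to the square of the
# 0.1-unit odds ratio.
or_02 = odds_ratio(beta, 0.2)
print(round(or_02, 2))  # 8.41
```

The wide confidence interval in the abstract (1.3-7.8) applies to the 0.1-unit odds ratio, so any rescaled value inherits at least that uncertainty.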
16.
Ultramicroscopy ; 219: 113123, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33032160

ABSTRACT

Tensor singular value decomposition (SVD) is a method to find a low-dimensional representation of data with meaningful structure in three or more dimensions. Tensor SVD has been applied to denoise atomic-resolution 4D scanning transmission electron microscopy (4D STEM) data. On data simulated from a SrTiO3 [100] perfect crystal and a Si [110] edge dislocation, tensor SVD achieved an average peak signal-to-noise ratio (PSNR) of ~40 dB, which matches or exceeds the performance of other denoising methods, with processing times at least 100 times shorter. On experimental data from SrTiO3 [100] and LiZnSb [112̄0]/GaSb [110] samples, tensor SVD denoises multi-gigabyte 4D STEM data sets in ten minutes on a typical personal computer. Denoising with tensor SVD improves both convergent beam electron diffraction patterns and virtual-aperture annular dark field images.
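
As a rough illustration of low-rank tensor denoising and the PSNR metric mentioned above, the sketch below projects each mode of a noisy tensor onto its leading singular vectors (a truncated higher-order SVD). This is a simple stand-in and not necessarily the tensor SVD variant used in the paper; the function names, tensor sizes, ranks, and noise level are all assumptions, and the data are synthetic.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_denoise(noisy, ranks):
    """Project every mode onto its leading left singular subspace."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(noisy, mode), full_matrices=False)
        factors.append(u[:, :r])
    out = noisy
    for mode, u in enumerate(factors):
        projector = u @ u.T
        out = np.moveaxis(np.tensordot(projector, out, axes=(1, mode)), 0, mode)
    return out

def psnr(clean, estimate):
    """Peak signal-to-noise ratio in dB, with the peak taken from `clean`."""
    mse = np.mean((clean - estimate) ** 2)
    return 10.0 * np.log10(np.max(np.abs(clean)) ** 2 / mse)

rng = np.random.default_rng(0)
dims, ranks = (20, 20, 30), (3, 3, 3)
core = rng.normal(size=ranks)
a, b, c = (rng.normal(size=(d, r)) for d, r in zip(dims, ranks))
clean = np.einsum("xyz,ix,jy,kz->ijk", core, a, b, c)   # low-rank signal
noisy = clean + 0.5 * rng.normal(size=dims)
denoised = hosvd_denoise(noisy, ranks)
print(psnr(clean, noisy), psnr(clean, denoised))
```

On this synthetic example the PSNR after projection is higher than before, because most of the noise lies outside the low-dimensional subspaces that carry the signal.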

17.
IEEE Trans Inf Theory ; 66(5): 3202-3231, 2020 May.
Article in English | MEDLINE | ID: mdl-33746242

ABSTRACT

Model reduction of Markov processes is a basic problem in modeling state-transition systems. Motivated by the state aggregation approach rooted in control theory, we study the statistical state compression of a discrete-state Markov chain from empirical trajectories. Through the lens of spectral decomposition, we study the rank and features of Markov processes, as well as properties like representability, aggregability, and lumpability. We develop spectral methods for estimating the transition matrix of a low-rank Markov model, estimating the leading subspace spanned by Markov features, and recovering latent structures like state aggregation and lumpable partition of the state space. We prove statistical upper bounds for the estimation errors and nearly matching minimax lower bounds. Numerical studies are performed on synthetic data and a dataset of New York City taxi trips.
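
The spectral estimation idea above can be sketched as follows: count observed transitions along a trajectory, truncate the empirical frequency matrix to rank r by SVD, then clip and renormalize rows to obtain a transition matrix. This is a simplified illustration under stated assumptions, not the paper's estimator or its guarantees; the function name, the chain size, and the simulated low-rank chain are hypothetical.

```python
import numpy as np

def estimate_low_rank_transition(traj, n_states, rank):
    """Rank-truncated estimate of a transition matrix from one trajectory."""
    traj = np.asarray(traj)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (traj[:-1], traj[1:]), 1.0)  # empirical transition counts
    freq = counts / counts.sum()                   # joint frequency matrix
    u, s, vt = np.linalg.svd(freq)
    freq_r = (u[:, :rank] * s[:rank]) @ vt[:rank]  # best rank-r approximation
    freq_r = np.clip(freq_r, 0.0, None)            # remove small negatives
    row = freq_r.sum(axis=1, keepdims=True)
    safe = np.where(row > 0, row, 1.0)
    # Rows with no mass (unvisited states) fall back to a uniform distribution.
    return np.where(row > 0, freq_r / safe, 1.0 / n_states)

# Simulate a chain whose transition matrix has rank at most 3.
rng = np.random.default_rng(0)
n, r = 10, 3
left = rng.dirichlet(np.ones(r), size=n)   # n x r, rows sum to 1
right = rng.dirichlet(np.ones(n), size=r)  # r x n, rows sum to 1
p_true = left @ right
traj = [0]
for _ in range(20000):
    traj.append(rng.choice(n, p=p_true[traj[-1]]))
p_hat = estimate_low_rank_transition(traj, n, r)
print(np.max(np.abs(p_hat - p_true)))
```

Latent structure such as a state aggregation could then be sought by clustering the rows of the leading singular vectors, which this sketch does not attempt.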

18.
IEEE Trans Inf Theory ; 66(9): 5927-5964, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33746244

ABSTRACT

In this paper, we propose a general framework for sparse and low-rank tensor estimation from cubic sketchings. A two-stage non-convex implementation is developed based on sparse tensor decomposition and thresholded gradient descent, which ensures exact recovery in the noiseless case and stable recovery in the noisy case with high probability. The non-asymptotic analysis sheds light on an interplay between optimization error and statistical error. The proposed procedure is shown to be rate-optimal under certain conditions. As a technical by-product, novel high-order concentration inequalities are derived for studying high-moment sub-Gaussian tensors. An interesting tensor formulation illustrates the potential application to high-order interaction pursuit in high-dimensional linear regression.

19.
Nucleic Acids Res ; 47(18): e111, 2019 10 10.
Article in English | MEDLINE | ID: mdl-31372654

ABSTRACT

A key challenge in modeling single-cell RNA-seq data is to capture the diversity of gene expression states regulated by different transcriptional regulatory inputs across individual cells, which is further complicated by the large number of observed zero and low expression values. We developed a left truncated mixture Gaussian (LTMG) model from the kinetic relationships among transcriptional regulatory inputs, mRNA metabolism, and mRNA abundance in single cells. LTMG infers the multimodality of expression across single cells, while dropouts and low expressions are treated as left truncated. We demonstrated that LTMG has significantly better goodness of fit on an extensive number of scRNA-seq data sets compared with three other state-of-the-art models. Our biological assumption about the low non-zero expressions, the rationality of the multimodality setting, and the capability of LTMG to extract expression states specific to cell types or functions are validated on independent experimental data sets. A differential gene expression test and a co-regulation module identification method are further developed. We experimentally validated that our differential expression test has higher sensitivity and specificity than five other popular methods. The co-regulation analysis is capable of retrieving gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , RNA/genetics , Single-Cell Analysis/methods , Software , Algorithms , Gene Expression Profiling , Gene Expression Regulation/genetics , Models, Statistical , Sequence Analysis, RNA/methods
20.
J Am Stat Assoc ; 114(528): 1708-1725, 2019.
Article in English | MEDLINE | ID: mdl-34290464

ABSTRACT

In this article, we consider the sparse tensor singular value decomposition, which aims for dimension reduction on high-dimensional high-order data with certain sparsity structure. A method named sparse tensor alternating thresholding for singular value decomposition (STAT-SVD) is proposed. The proposed procedure features a novel double projection & thresholding scheme, which provides a sharp criterion for thresholding in each iteration. Compared with the regular tensor SVD model, STAT-SVD permits more robust estimation under weaker assumptions. Both upper and lower bounds for estimation accuracy are developed. The proposed procedure is shown to be minimax rate-optimal in a general class of situations. Simulation studies show that STAT-SVD performs well under a variety of configurations. We also illustrate the merits of the proposed procedure on a longitudinal tensor dataset on European country mortality rates. Supplementary materials for this article are available online.
